ABSTRACT

This  paper  investigates  the  conditions  under  which  a 
discrete  optimization  problem  can  be  formulated  as  a  dynamic  pro- 
gram. Following  the  terminology  of  (Karp  and  Held  1967),  a 
discrete  optimization  problem  is  formalized  as  a  discrete  deci- 
sion problem  and  the  class  of  dynamic  programs  is  formalized  as  a 
sequential  decision  process.  Necessary  and  sufficient  conditions 
for  the  representation  in  two  different  senses  of  a  discrete  de- 
cision problem  by  a  sequential  decision  process  are  established. 
In  the  first  sense  (a  strong  representation)  the  set  of  all  op- 
timal solutions  to  the  discrete  optimization  problem  is  obtain- 
able from  the  solution  of  the  functional  equations  of  dynamic 
programming.  In  the  second  sense  (a  weak  representation)  a 
nonempty  subset  of  optimal  solutions  is  obtainable  from  the  solu- 
tion of  the  functional  equations  of  dynamic  programming.  It  is 
shown  that  the  well  known  principle  of  optimality  corresponds  to 
a  strong  representation.  A  more  general  version  of  the  principle 
of  optimality  is  given  which  corresponds  to  a  weak  representation 
of  a  discrete  decision  problem  by  a  sequential  decision  process. 
We also show that the class of strongly representable discrete
decision problems is equivalent to the class of sequential decision
processes which have cost functions satisfying a strict monotonicity
condition. A new derivation is also given of the result
that the class of weakly representable discrete decision problems
is equivalent to the class of sequential decision processes which
have a cost function satisfying a monotonicity condition.


1. Introduction

Dynamic  programming  has  proven  to  be  one  of  the  principal 
methods  for  the  formulation  and  solution  of  discrete  optimization 
problems.  A  number  of  studies  have  explored  the  extent  to  which 
dynamic  programming  is  applicable  to  such  problems,  including 
(Mitten 1964, Karp and Held 1967, Elmaghraby 1970, Bonzon 1970,
Ibaraki 1972, 1973, and others cited in the references). A recent
survey  of  solution  techniques  and  applications  of  dynamic  pro- 
gramming appears  in  (Morin  1978).  Mitten  was  the  first  to  point 
out  the  essential  role  that  the  monotonicity  of  the  cost  function 
plays in a dynamic program. Subsequently, (Karp and Held 1967)
studied dynamic programs in terms of a finite state machine with
a superimposed cost structure (an sdp as defined below), and
attacked the problem of characterizing the representations of a
discrete optimization problem by an sdp with a monotonic cost
function.

In  this  paper  the  notion  of  a  discrete  optimization  problem 
is  formalized  as  a  discrete  decision  problem  (ddp)  and  the  gen- 
eral setting  within  which  the  functional  equations  of  dynamic 
programming  can  be  applied  is  formalized  as  a  sequential  decision 
process (sdp) following the general lines of (Karp and Held
1967). Necessary and sufficient conditions for the representation
in two different senses of a ddp by an sdp are established in
theorems 2 through 7. In the first sense (a strong representation)
the set of all optimal solutions to the discrete optimization
problem is obtainable from the solution of the functional
equations  of  dynamic  programming.  In  the  second  sense  (a  weak 
representation)  a  nonempty  subset  of  optimal  solutions  is  obtain- 
able from  the  solution  of  the  functional  equations  of  dynamic 
programming.  It  is  shown  that  the  well  known  principle  of 
optimality  corresponds  to  a  strong  representation.  A  more  gen- 
eral version  of  the  principle  of  optimality  is  given  which 
corresponds to a weak representation of a ddp by an sdp. It is
shown that sdp's having a strictly monotonic cost function are in
one-to-one correspondence with strong representations of ddp's.
Finally, a new derivation is given of the result that sdp's having
a monotonic cost function are in one-to-one correspondence with
weak representations of ddp's.

Our notion of a weak representation is new in that we require
neither all optimal solutions nor the correct cost of the
optimal solutions, but are satisfied with some optimal solutions.
Presumably, if the correct costs were required, one could compute
the cost of the optimal solutions using the cost function of the
ddp after they have been found by some method. The notion of
strong representation was introduced, along with an even stronger
sense of representation, in (Ibaraki 1972).

2.   Definitions. 

A discrete decision problem is intended as a general model
of combinatorial optimization problems. A discrete decision
problem is a system $D=(A,S,P,f)$ where

$A$ is a finite nonempty alphabet (set of primitive decisions),

$S \subseteq A^*$ (set of feasible policies),

$P$ is a set (the set of data inputs for the problem),

$f: S \times P \to R$, where $R$ is the set of positive reals (cost or
objective function).

An instance of a discrete decision problem $D$, denoted $D(p)$,
is given by a particular data input $p \in P$. A policy $s \in S$ is
optimal with respect to input $p \in P$ if $f(s,p) \le f(t,p)$ for all
$t \in S$. The set of optimal policies for the problem instance $D(p)$
is denoted $O(D,p)$. We will be interested in the conditions under
which the problem of finding $O(D,p)$ or a subset of $O(D,p)$ can be
formulated as a dynamic program.

One of the simplest discrete decision problems is the problem
of finding the least cost path from the start node to a goal
node in an arc-weighted directed graph. This problem can be
represented as a ddp as follows: let $A$ be the set of arcs $(i,j)$
in the graph, where $(i,j)$ represents the decision to move from
node $i$ to node $j$; $S$ is then the set of sequences of arcs which
move from the start node to a goal node; $P$ is the set of cost
matrices $(p_{ij})$, where $p_{ij}$ is the cost of arc $(i,j)$; and
finally $f(s,p)$ is the cost of arc sequence (path) $s$ with respect
to input $p$; more precisely,

$$f(s,p) = \sum_{(i,j) \in s} p_{ij}.$$
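As a concrete illustration of the least cost path ddp, the following
Python sketch enumerates the feasible policies S of a small acyclic
graph and evaluates f(s,p) for one cost matrix p. The graph, the
costs, and the helper names (feasible_policies, f, optimal_policies)
are hypothetical and chosen only for illustration. Brute-force
enumeration is used here for clarity; it is not the dynamic
programming method studied in the rest of the paper.

```python
# Hypothetical 4-node acyclic graph: node 0 is the start node, node 3 the goal.
ARCS = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]   # the alphabet A
START, GOAL = 0, 3


def feasible_policies(node=START, prefix=()):
    """Yield every arc sequence leading from START to GOAL (the set S)."""
    if node == GOAL:
        yield prefix
        return
    for (i, j) in ARCS:
        if i == node:
            yield from feasible_policies(j, prefix + ((i, j),))


def f(s, p):
    """Cost of path s under cost matrix p: the sum of p[i, j] over arcs in s."""
    return sum(p[arc] for arc in s)


def optimal_policies(p):
    """O(D, p): every feasible policy whose cost is minimal."""
    policies = list(feasible_policies())
    best = min(f(s, p) for s in policies)
    return [s for s in policies if f(s, p) == best]


# One assumed data input p (a cost matrix stored as a dict of positive costs).
p = {(0, 1): 1, (0, 2): 3, (1, 2): 2, (1, 3): 5, (2, 3): 1}
print(optimal_policies(p))
```

With this particular p two paths tie for the least cost, so O(D,p)
contains more than one policy; this is the situation in which the
distinction between strong and weak representations matters.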

The functional equations of dynamic programming apply to a
kind of process called a sequential decision process. A sequential
decision process (sdp) is a system $\Pi = (A,Q,q_0,Q_f,t,h,k,P)$
where

$A$ is a finite nonempty alphabet (set of primitive decisions),

$Q$ is a set (set of states),

$q_0 \in Q$ (start state),

$Q_f \subseteq Q$ (set of final states),

$t: Q \times A \to Q$ (transition function),

$h: R \times Q \times A \times P \to R$ (cost or objective function),

$k: P \to R$ (initial cost function),

$P$ is a set (input data specifications).

The transition function $t$ applies a decision $a \in A$ to a state
$q \in Q$, resulting in a transition to a new state $t(q,a)$. We can
extend the domain of $t$ to $Q \times A^*$ by the following recursive
definition: let $t(q,e)=q$ for $q \in Q$, where $e$ is the empty
sequence, and $t(q,xa)=t(t(q,x),a)$ for $q \in Q$, $x \in A^*$, and
$a \in A$. Thus $t(q,xa)$ is the state resulting from applying the
decision sequence $xa$ to the initial state $q$. When only one
argument is given to $t$ the path will be assumed to originate at
the start state; thus $t(x)$ is the state resulting from applying
the decision sequence $x$ from the start state. Let
$F(\Pi) = \{x \mid t(x) \in Q_f\}$. Each $x \in F(\Pi)$ is a feasible
decision sequence which $t$ maps (by definition) from $q_0$ to some
final state $q_f \in Q_f$. Note that the first five components of a
sequential decision process comprise a finite state automaton
(Hopcroft and Ullman 1969). The cost function $h(c,q,a,p)$ is the
cost of reaching
state $t(q,a)$ by a sequence reaching state $q$ with cost $c$ which is
extended by decision $a$. The initial cost function $k(p)$ is the
cost of a null sequence given input $p$. It will be useful to
consider the special case of decision sequences applied to the
start state as follows: let $g(e,p)=k(p)$ and
$g(xa,p) = h(g(x,p),t(x),a,p)$ for $x \in A^*$, $a \in A$, $p \in P$. Thus
$g(x,p)$ gives the cost of reaching state $t(x)$ from $q_0$ by means of
the sequence of decisions $x$. Finally, since we are interested in
optimal decision sequences, let us define (and assume the existence
of) $G(q_0,p)=k(p)$ and

$$G(q,p) = \min_{\{x \mid t(x)=q\}} g(x,p)$$

for all $q \ne q_0$, $p \in P$; thus $G(q,p)$ is the cost of the least
cost decision sequence reaching state $q$ from $q_0$. We say
$x \in A^*$ is an optimal decision sequence reaching state $q$ if
$t(x)=q$ and $G(q,p)=g(x,p)$. The set of optimal decision sequences
reaching a final state of $\Pi$ is denoted $O(\Pi,p)$. Note that
$O(\Pi,p)$ is always nonempty since there is at least one least cost
sequence reaching each final state of $\Pi$. An sdp $\Pi$ represents a
ddp $D$ if $F(\Pi)=S$ and $O(\Pi,p) \subseteq O(D,p)$.
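
The definitions of t, g, and G above can be sketched in Python for a
small sdp. Everything concrete here is an assumption made for
illustration: the two-letter alphabet, the states, the additive form
of h, the constant initial cost k(p)=1, and the bound max_len that
stands in for the minimum over all sequences (which is adequate only
because every sequence longer than two decisions ends in the trap
state "dead").

```python
from itertools import product

A = ("a", "b")                      # alphabet of primitive decisions
Q0, QF = "q0", {"q2"}               # start state and set of final states
# One-step transition function, stored as a dict; "dead" is a trap state.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q2",
    ("q1", "a"): "q2", ("q1", "b"): "q2",
    ("q2", "a"): "dead", ("q2", "b"): "dead",
    ("dead", "a"): "dead", ("dead", "b"): "dead",
}


def t(x, q=Q0):
    """Extended transition function: state reached by applying sequence x to q."""
    for a in x:
        q = DELTA[(q, a)]
    return q


def k(p):
    """Initial cost of the empty sequence (an assumed positive constant)."""
    return 1


def h(c, q, a, p):
    """Assumed additive cost: cost so far plus the cost of decision a in state q."""
    return c + p[(q, a)]


def g(x, p):
    """g(e, p) = k(p); g(xa, p) = h(g(x, p), t(x), a, p)."""
    c, q = k(p), Q0
    for a in x:
        c, q = h(c, q, a, p), DELTA[(q, a)]
    return c


def G(q, p, max_len=3):
    """Least cost g(x, p) over sequences x of length <= max_len with t(x) = q."""
    return min(g(x, p)
               for n in range(max_len + 1)
               for x in product(A, repeat=n)
               if t(x) == q)


# One assumed data input p: a cost for each (state, decision) pair that is used.
p = {("q0", "a"): 2, ("q0", "b"): 6, ("q1", "a"): 3, ("q1", "b"): 5}
```

In this sketch the feasible sequences are b, aa, and ab, and the
sequence aa is the optimal decision sequence reaching the final state.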

3. Representations of a discrete decision problem.


Before turning to our primary problem of characterizing the
representations of a ddp by a dynamic program, we give necessary
and sufficient conditions for the representation, as defined
above, of a ddp by an sdp. We first summarize some concepts and
results on finite automata (Hopcroft and Ullman 1969) which will
be needed only in the present section. The equiresponse relation
$R$ of a finite automaton is defined by $xRy$ iff $t(x)=t(y)$, for all
$x,y \in A^*$. An equivalence relation $R$ on $A^*$ is called right
invariant if $xRy \rightarrow (\forall z \in A^*)\, xzRyz$. If $R$ and $T$
are equivalence relations on $A^*$ then $R$ refines $T$ if
$\forall x,y \in A^*$, $xRy \rightarrow xTy$. An equivalence relation has
finite rank if it has only a finite number of equivalence classes.
Note that the equiresponse relation of a finite automaton is right
invariant since
$t(x)=t(y) \rightarrow t(xz) = t(t(x),z) = t(t(y),z) = t(yz)$.
Finally, for some $S \subseteq A^*$ define the equivalence relation $R_S$
as follows:

$$x R_S y \text{ iff } (\forall z \in A^*)\; xz \in S \leftrightarrow yz \in S.$$

The following proposition gives us an essential property of finite
automata.

Proposition 1. Let $S \subseteq A^*$ and let $R$ be a right invariant
equivalence relation of finite rank; then $R$ is the equiresponse
relation of a finite automaton which accepts $S$ iff $R$ refines $R_S$.

proof: see (Hopcroft and Ullman 1969, p. 29).


Theorem 1. An sdp $\Pi=(A,Q,q_0,Q_f,t,h,k,P)$ represents a ddp
$D=(A,S,P,f)$ iff the following conditions hold:

1. the equivalence relation $R$ defined by $xRy$ iff $t(x)=t(y)$ for
$x,y \in A^*$ is a right invariant equivalence relation of finite
rank which refines $R_S$.

2. $(\forall p \in P)(\exists x \text{ s.t. } t(x) \in Q_f)(\forall y \text{ s.t. } t(y) \in Q_f)\ g(y,p) \le g(x,p) \rightarrow y \in O(D,p)$.

proof: (if): Suppose that conditions 1 and 2 hold. By proposition
1, $R$ is the equiresponse relation of a finite automaton
which accepts the language $S$, so $F(\Pi)=S$. Let $x$ satisfy condition
2, so $(\forall y \in S \text{ s.t. } t(y) \in Q_f)\ g(y,p) \le g(x,p) \rightarrow y \in O(D,p)$.
Let $\hat{y} \in O(\Pi,p)$, so
$(\forall y \text{ s.t. } t(y) \in Q_f)\ g(\hat{y},p) \le g(y,p) \rightarrow g(\hat{y},p) \le g(x,p) \rightarrow \hat{y} \in O(D,p)$;
thus $O(\Pi,p) \subseteq O(D,p)$.

(only if): Suppose now that $\Pi$ represents $D$, so $F(\Pi)=S$ and
$O(\Pi,p) \subseteq O(D,p)$. $R$ is the equiresponse relation of a finite
automaton which accepts $S$, so $R$ is a right invariant equivalence
relation of finite rank. By proposition 1, $R$ refines $R_S$, so
condition 1 holds. Let $\hat{y} \in O(\Pi,p)$; then
$(\forall y \text{ s.t. } t(y) \in Q_f)\ g(y,p) \le g(\hat{y},p) \rightarrow g(y,p) = g(\hat{y},p) \rightarrow y \in O(\Pi,p) \rightarrow y \in O(D,p)$.
Thus condition 2 holds. QED

There are several important aspects of our representations
of ddp's by sdp's which should be pointed out. In mapping from a
ddp to an sdp, we assume the notion of a state (the equivalence
classes of $R$ in theorem 1), the existence of the transition
function $t$, which depends only on the current state and input
decision, and a cost function which is separable in the sense that
the cost of adding a transition onto the end of a sequence
depends only on the current state, the input decision, and the cost
of the sequence (in general the cost might depend on all previous
decisions). This much structure is implicit in the concept of a
dynamic program. A closer examination of these assumptions may
be found in (Elmaghraby 1970).

4. Strong representations of a discrete decision problem.

Our purpose is to discover the conditions under which an sdp
$\Pi$ represents a ddp $D$ by means of a discrete dynamic program. The
principle underlying dynamic programming has been formulated by
Bellman in the Principle of Optimality (Bellman 1957) and can be
paraphrased as follows:

An  optimal  sequence  has  the  property  that  no  matter  what  the 
next-to-last  state  and  the  next-to-last  decision  are  the  sequence 
reaching  the  next-to-last  state  must  be  optimal. 

This version of the principle of optimality is illustrated
in figure 1a. If, for $a \in A$ and $x \in A^*$, $xa$ is an optimal sequence
from state $q_0$ to $q_f$, then $x$ is an optimal sequence from $q_0$ to
$t(q_0,x)$. In general the principle of optimality implies that
if $xy$, for $x,y \in A^*$, is an optimal sequence from $q_0$ to $q_f$, then
$x$ is an optimal sequence from $q_0$ to $t(q_0,x)$ and $y$ is an optimal
sequence from $t(q_0,x)$ to $q_f$, as illustrated in figure 1b. This
illustration applies only to discrete sequences and so should not
be construed to demonstrate the full range of dynamic programming,
which is much broader.

In terms of an sdp the principle of optimality can be made
precise as follows:

$$(\forall p \in P)(\forall x \in A^*)(\forall a \in A)\quad G(t(xa),p)=g(xa,p) \rightarrow G(t(x),p)=g(x,p) \qquad (1)$$

Figure 1.


The following theorem states an equivalent form for (1). Let
$\Pi=(A,Q,q_0,Q_f,t,h,k,P)$ be an sdp. $h$ is s'-monotonic if for all
states $q \in Q$, optimal sequences $xa$ reaching state $q$, and
sequences $ya$ reaching $q$, we have
$g(x,p)<g(y,p) \leftrightarrow g(xa,p)<g(ya,p)$. An sdp containing an
s'-monotonic cost function is an s'-monotonic sequential decision
process (s'-msdp). We say $h$ is strictly monotonic (s-monotonic)
if for all $x,y \in A^*$ such that $t(x)=t(y)$, and all $a \in A$,
$g(x,p)<g(y,p) \rightarrow g(xa,p)<g(ya,p)$. A sequential decision process
which contains an s-monotonic cost function is called a strictly
monotonic sequential decision process (s-msdp).
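
Since g(xa,p) = h(g(x,p),t(x),a,p), strict monotonicity amounts to
requiring that h be strictly increasing in its cost argument over
the costs of sequences reaching a common state. The following Python
sketch tests this on a finite sample of such costs. The two cost
functions, the sample values, and the helper name are assumptions
made for illustration, and a finite check of this kind can refute
strict monotonicity but cannot prove it in general.

```python
from itertools import combinations


def violates_strict_monotonicity(h, costs, q, a, p):
    """Return a pair (c1, c2) with c1 < c2 but h(c1, ...) >= h(c2, ...), else None.

    `costs` is an assumed finite sample of values g(x, p) taken over
    sequences x that all reach the same state q.
    """
    for c1, c2 in combinations(sorted(set(costs)), 2):
        if not h(c1, q, a, p) < h(c2, q, a, p):
            return (c1, c2)
    return None


def h_additive(c, q, a, p):
    """Additive cost: strictly increasing in c."""
    return c + p[(q, a)]


def h_bottleneck(c, q, a, p):
    """Bottleneck cost: nondecreasing in c, but not strictly increasing."""
    return max(c, p[(q, a)])


p = {("q1", "a"): 16}              # assumed cost of decision a in state q1
costs_reaching_q1 = [10, 12]       # assumed costs of two sequences reaching q1

print(violates_strict_monotonicity(h_additive, costs_reaching_q1, "q1", "a", p))
print(violates_strict_monotonicity(h_bottleneck, costs_reaching_q1, "q1", "a", p))
```

The additive cost passes the check on this sample, while the
bottleneck cost maps both 10 and 12 to 16 and so fails it.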

Theorem 2. (1) holds for an sdp $\Pi=(A,Q,q_0,Q_f,t,h,k,P)$ iff $h$ is
s'-monotonic.

proof: (only if): Suppose that (1) holds for some sdp $\Pi$ and
that $h$ is not s'-monotonic. Let $xa$ be an optimal sequence
reaching state $q$, let $q'=t(x)$, and let $y$ be a sequence such that
$t(x)=t(y)$. Suppose first that $g(x,p)<g(y,p)$ and
$g(xa,p) \ge g(ya,p)$. Since $G(q,p)=g(xa,p) \ge g(ya,p)$, we have
$g(xa,p)=g(ya,p)$. By (1), $G(q',p)=g(x,p)=g(y,p)$, but this
contradicts our assumption that $g(x,p)<g(y,p)$. Thus
$g(x,p)<g(y,p) \rightarrow g(xa,p)<g(ya,p)$. Suppose instead we have
$g(xa,p)<g(ya,p)$ but $g(x,p) \ge g(y,p)$. Now $g(x,p) \ne g(y,p)$
since $g(xa,p) \ne g(ya,p)$, so $g(x,p)>g(y,p)$. But by (1) and our
assumption that $xa$ is an optimal sequence reaching $q$, we have
$G(q',p)=g(x,p) \le g(y,p)$ by definition of $G$. This contradiction
shows that $g(xa,p)<g(ya,p) \rightarrow g(x,p)<g(y,p)$ when $xa$ is an
optimal sequence reaching state $q$. Thus (1) implies that $h$ is
s'-monotonic.

(if): Suppose now that $h$ is s'-monotonic. If (1) does not
hold then for some sequence $xa$ such that $t(xa)=q$, we have
$G(q,p)=g(xa,p)$ but $G(q',p) \ne g(x,p)$, where $q'=t(x)$ and
$t(q',a)=q$. For some $y \in A^*$ such that $t(x)=t(y)$ we have
$G(q',p)=g(y,p)<g(x,p)$. If $g(ya,p)=g(xa,p)=G(q,p)$ then $h$ is not
s'-monotonic (with respect to optimal sequence $xa$), so we must
have $g(ya,p)>g(xa,p)$. But since $h$ is s'-monotonic we have
$g(y,p)>g(x,p)$, which contradicts our earlier finding that
$g(y,p)<g(x,p)$. Thus (1) must hold. QED

In practice we wish to find optimal policies between states.
We define below the tables $T(q,p)$ which store the information
necessary to obtain optimal policies. Formally, for all $q \in Q$,
$p \in P$, $T(q,p)$ is a subset of $Q \times A$
($T: Q \times P \to 2^{Q \times A}$). A set of policies $\theta(q,p)$ is
obtainable from the tables $T(q,p)$ as follows: let

$\theta(q_0,p) = \{e\}$, where $e$ is the empty string,

$\theta(q,p) = \{ya \mid (q',a) \in T(q,p) \text{ and } y \in \theta(q',p)\}$ for $q \ne q_0$.

A ddp $D=(A,S,P,f)$ is strongly-represented (weakly-represented) by
an sdp $\Pi=(A,Q,q_0,Q_f,t,h,k,P)$ if i) $\Pi$ represents $D$, ii) the
functional equations (2) and (3) given below hold, and iii) for
$q \in Q$, $p \in P$ the set of policies obtainable from the tables
$T(q,p)$ is the set (a subset) of all optimal policies; in particular
$\bigcup_{q \in Q_f} \theta(q,p) = O(\Pi,p)$
($\bigcup_{q \in Q_f} \theta(q,p) \subseteq O(\Pi,p)$ for a weak
representation).

$$G(q_0,p) = k(p) \qquad (2)$$

$$G(q,p) = \min_{\{(q',a) \mid t(q',a)=q\}} h(G(q',p),q',a,p) \qquad (3)$$

$$T(q,p) = \{(q',a) \mid t(q',a)=q,\; G(q,p)=h(G(q',p),q',a,p)\} \qquad (4)$$

Note that if $\Pi$ strongly (weakly) represents $D$ then by (i)
$O(\Pi,p)=O(D,p)$ and thus $\bigcup_{q \in Q_f} \theta(q,p) = O(D,p)$
($\bigcup_{q \in Q_f} \theta(q,p) \subseteq O(D,p)$); i.e., the
construction of the tables $\theta$ by means of (2), (3), and
(4) results in the construction of all (a nonempty subset of)
optimal solutions to the ddp $D$.
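
The functional equations (2) and (3), the tables (4), and the sets of
policies obtained from the tables can be sketched in Python for a
small acyclic sdp. The states, the transition table, the costs, the
additive cost function h, and the initial cost 1 are all assumptions
chosen for illustration. The data input p is fixed as the module-level
dict P_COST rather than passed as an argument, and decisions are
single characters so that a decision sequence can be written as a
string. The recursion terminates only because the assumed transition
graph is acyclic and the start state has no predecessors.

```python
from functools import lru_cache

Q0 = "q0"
# Assumed acyclic transition function t(q', a) = q, stored as a dict.
DELTA = {("q0", "a"): "q1", ("q0", "b"): "q2",
         ("q1", "a"): "q3", ("q2", "a"): "q3"}
# One assumed data input p: a positive cost for each (state, decision) pair.
P_COST = {("q0", "a"): 2, ("q0", "b"): 3, ("q1", "a"): 4, ("q2", "a"): 3}


def k():
    """Initial cost of the empty sequence (an assumed positive constant)."""
    return 1


def h(c, q, a):
    """Assumed additive cost function, strictly increasing in c."""
    return c + P_COST[(q, a)]


def predecessors(q):
    """All pairs (q', a) with t(q', a) = q."""
    return [(q1, a) for (q1, a), q2 in DELTA.items() if q2 == q]


@lru_cache(maxsize=None)
def G(q):
    if q == Q0:
        return k()                                             # equation (2)
    return min(h(G(q1), q1, a) for q1, a in predecessors(q))   # equation (3)


def T(q):
    """Table of equation (4): the minimizing pairs (q', a) for state q."""
    return {(q1, a) for q1, a in predecessors(q) if G(q) == h(G(q1), q1, a)}


def theta(q):
    """Policies obtainable from the tables, written as strings of decisions."""
    if q == Q0:
        return {""}
    return {y + a for q1, a in T(q) for y in theta(q1)}


print(G("q3"), sorted(theta("q3")))
```

Because the assumed h is strictly monotonic, both least cost
sequences reaching q3 are recovered from the tables, which is the
behavior required of a strong representation.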

Lemma 1. $x \in \theta(q,p) \rightarrow x$ is an optimal sequence reaching state $q$.

proof: the lemma follows immediately from the stronger lemma 2
which is given in the appendix.

We do not require that an optimal sequence have the same cost in
$D$ as in $\Pi$. Our interest is in obtaining optimal solutions and in
making use of the functional equations (2) and (3). These equations
are characteristic of dynamic programming and are often
considered a direct translation of the principle of optimality.
We take (1) as a more direct translation and show next that, in
the sense of a strong representation, (1) and the equations (2)
and (3) are equivalent.

Theorem 3. A ddp $D=(A,S,P,f)$ is strongly-represented by an sdp
$\Pi=(A,Q,q_0,Q_f,t,h,k,P)$ iff $\Pi$ represents $D$ and (1) holds.

proof: (if): Suppose that (1) holds and $\Pi$ represents $D$. In
order to show that the ddp $D$ may be strongly-represented by the
sdp $\Pi$, we must show that $\Pi$ represents $D$ (which we have assumed),
that (2) and (3) hold, and that all optimal policies may be obtained
from the tables defined by (4). First, (2) holds by definition
of $G$. Let $H(q,p)$ denote the right hand side of (3). We will
show that $G(q,p)=H(q,p)$. Suppose that $ya$ is an optimal policy
reaching state $q$, so $G(q,p)=g(ya,p)$. Since (1) holds we then
have $G(\bar{q},p)=g(y,p)$ where $t(\bar{q},a)=q$. Thus

$$G(q,p) = h(g(y,p),\bar{q},a,p) = h(G(\bar{q},p),\bar{q},a,p) \ge \min_{\{(q',a) \mid t(q',a)=q\}} h(G(q',p),q',a,p) = H(q,p),$$

or $G(q,p) \ge H(q,p)$.

Now let $H(q,p)=h(G(\bar{q},p),\bar{q},a,p)$ for some $\bar{q} \in Q$ and
suppose $G(\bar{q},p)=g(y,p)$ where $t(y)=\bar{q}$; i.e., $y$ is an optimal
policy reaching $\bar{q}$. Let $t(ya)=q$; then
$G(q,p) \le g(ya,p) = h(g(y,p),\bar{q},a,p) = h(G(\bar{q},p),\bar{q},a,p) = H(q,p)$,
thus $G(q,p) \le H(q,p)$. Combining these results we have
$G(q,p)=H(q,p)$ and (3) holds.

By lemma 1 all policies in $\theta(q,p)$ are optimal with respect
to $h$. Suppose though that not all optimal policies can be
obtained from (4). Let $xa$ be an optimal policy of shortest
length reaching state $q$ which is not in $\theta(t(xa),p)$. Let
$t(x)=q'$. By (1) $x$ is optimal, thus $x \in \theta(q',p)$ (since $x$ has
shorter length than $xa$) and $G(q',p)=g(x,p)$. Since
$xa \notin \theta(t(xa),p)$ we must have
$G(t(xa),p) < h(G(q',p),q',a,p) = h(g(x,p),q',a,p) = g(xa,p)$, but
this contradicts our assumption that $xa$ is an optimal sequence
reaching state $q$. Therefore $(q',a) \in T(q,p)$ and by definition
$xa \in \theta(q,p)$, so $\theta(q,p)$ is the set of all optimal sequences
reaching state $q$. In particular
$\bigcup_{q \in Q_f} \theta(q,p) = O(\Pi,p)$.

(only if): Suppose now that the ddp $D$ is strongly-representable
by the sdp $\Pi$. For any $q \in Q$, $x \in A^*$, we are able to
obtain all optimal policies reaching state $q$ using (2), (3), and
(4). Consider $xa \in \theta(q,p)$ where $t(xa)=q$, $t(x)=q'$. By lemma 1
$xa$ is an optimal sequence reaching state $q$. By definition
$x \in \theta(q',p)$, and by lemma 1 $x$ is an optimal policy reaching
$q'$, so $G(q',p)=g(x,p)$. Thus (1) holds. $\Pi$ represents $D$ by
assumption. QED

Corollary 1. A ddp $D=(A,S,P,f)$ is strongly-represented by an sdp
$\Pi=(A,Q,q_0,Q_f,t,h,k,P)$ iff $\Pi$ represents $D$ and $\Pi$ is an s'-msdp.

proof: immediate from theorems 2 and 3.

The s'-monotonicity of the cost function of an sdp is an
essential ingredient in a strong representation of a ddp. It can
be shown however that any s'-monotonic cost function is effectively
equivalent to some strictly monotonic cost function.
Given an s'-monotonic function $h$, define the function $g'$ (and
thereby $h'$ implicitly) as follows, where $q=t(xa)$:

$$g'(xa,p) = \begin{cases} g(xa,p) & \text{if } G(q,p)=g(xa,p) \\ G(q,p)+g'(x,p) & \text{otherwise.} \end{cases} \qquad (5)$$

Define $G'(q,p) = \min_{\{x \mid t(x)=q\}} g'(x,p)$. Note that by definition
$G(q,p)=G'(q,p)$ for all states $q$ and inputs $p$. Lemma 4 given in
the appendix establishes the effective equivalence of $h$ and $h'$ in
the sense that the set of optimal sequences obtained for each
state is the same for both cost functions.

Lemma 3. If $h$ is s'-monotonic then $h'$ defined by (5) is strictly
monotonic.

proof: Let $h'$ be defined from the s'-monotonic function $h$ by (5).
Suppose for $x,y \in A^*$ such that $t(x)=t(y)$, we have
$g'(x,p)<g'(y,p)$. We have 2 cases to consider in order to show that
$g'(xa,p)<g'(ya,p)$. Let $a \in A$ such that $t(xa)=q$. Case 1: $ya$ is
not optimal. By construction of $g'$, $g'(ya,p)=G(q,p)+g'(y,p)$ and
$g'(xa,p)$ has the value $G(q,p)$ or $G(q,p)+g'(x,p)$, either of which
is strictly less than $g'(ya,p)$. Case 2: $ya$ is an optimal
sequence reaching state $q$. If $ya$ is optimal then
$g'(ya,p)=g(ya,p)=G(q,p)$. Also by theorem 2, (1) holds so $y$ is an
optimal sequence; i.e., $g'(y,p) = g(y,p) = G(q',p) = G'(q',p)$,
but this contradicts our assumption that
$g'(x,p)<g'(y,p) = G'(q',p)$. QED

Theorem 4. A ddp $D=(A,S,P,f)$ is strongly represented by an sdp
$\Pi=(A,Q,q_0,Q_f,t,h,k,P)$ iff there is a strictly monotonic sdp
$\Pi'=(A,Q,q_0,Q_f,t,h',k,P)$ which strongly represents $D$.

proof: (only if): Clearly any s-msdp is an s'-msdp, so by corollary
1 the statement of the theorem is consistent and $D$ is
strongly represented by $\Pi'$.

(if): Suppose that $D$ is strongly represented by
$\Pi=(A,Q,q_0,Q_f,t,h,k,P)$; then by corollary 1 $h$ is an s'-monotonic
cost function. Consider $h'$ defined by (5), which is s-monotonic
by lemma 3. We need to show that $\Pi'=(A,Q,q_0,Q_f,t,h',k,P)$
strongly represents $D$. (2) holds by definition. In order to
show that (3) holds, let $xa$ be an optimal sequence reaching state
$q$. By construction $G(q,p)=G'(q,p)$ for all states $q \in Q$. Equation
(3) then holds for $G'$ since it holds for $G$ by corollary 1.
Equation (4) holds since lemma 4, given in the appendix, shows that
$\theta'(q,p)=\theta(q,p)$, so $\theta'(q,p)$ is the set of all optimal
sequences reaching state $q$. Finally $\Pi'$ represents $D$ since
$F(\Pi')=F(\Pi)=S$ and

$$O(D,p) = O(\Pi,p) = \bigcup_{q \in Q_f} \theta(q,p) = \bigcup_{q \in Q_f} \theta'(q,p) = O(\Pi',p).$$

QED

5. Weak representations of a discrete decision problem.

We have been looking at the conditions under which we can
find all optimal decision sequences reaching any state from $q_0$.
In practice we may relax this requirement and be satisfied with
some (or just one) optimal sequence reaching each state in $Q$. We
now explore the conditions under which this requirement can be
satisfied.

We  have  seen  how  a  direct  translation  of  the  principle  of 
optimality  helped  to  establish  the  conditions  for  its  applica- 
tion. In  the  more  general  situation  faced  now  it  may  be  helpful 
to  give  a  generalized  principle  of  optimality  which  applies  when 
we  are  interested  in  obtaining  only  some  optimal  decision 
sequences. 

Generalized  principle  of  optimality  (forward  version) :  If  there 
is  an  optimal  sequence  reaching  state  q,  then  there  is  an  optimal 
sequence  reaching  state  q  with  the  property  that  no  matter  what 
the  last  decision  and  last  state  q'  were,  the  sequence  reaching 
q'  is  an  optimal  sequence. 

Given $p \in P$, a sequence $xa$ is 1-optimal if $G(t(xa),p)=g(xa,p)$ and
$G(t(x),p)=g(x,p)$. This generalized principle of optimality can
be formalized as follows:

$$(\forall p \in P)(\forall q \in Q)\ \text{there is a 1-optimal sequence reaching state } q \qquad (6)$$

In these terms we can reformulate the (original) principle of
optimality as follows: $\forall p \in P$, $\forall q \in Q$, every optimal
sequence reaching state $q$ is 1-optimal. Condition (6) can be
expressed solely in terms of the cost function $h$, as given below
in theorem 5. $h$ is b-monotonic if for all $q \in Q$, some optimal
sequence $xa$ reaching $q$, and sequences $ya \in A^*$ reaching $q$, we
have $g(xa,p)<g(ya,p) \rightarrow g(x,p)<g(y,p)$. An sdp
$\Pi=(A,Q,q_0,Q_f,t,h,k,P)$ in which $h$ is b-monotonic is a
b-monotonic sequential decision process (b-msdp).

Theorem 5. (6) holds iff $h$ is b-monotonic.

proof: (if): Consider an arbitrary state $q \in Q$ and let $h$ be
b-monotonic. We will show there exists a 1-optimal sequence
reaching state $q$. Let $xa$ be an optimal sequence reaching state $q$.
Let $P(q)$ denote the set of sequences such that $y \in P(q)$ iff
$t(y)=q$. Partition $P(t(x))$ into two sets as follows: let

$$Y(x,a) = \{y \mid y \in P(t(x)),\ g(xa,p)=g(ya,p),\ g(x,p)>g(y,p)\}$$

$$Z(x,a) = \{z \mid z \in P(t(x)),\ g(xa,p)<g(za,p)\} \cup \{z \mid z \in P(t(x)),\ g(xa,p)=g(za,p),\ g(x,p) \le g(z,p)\}$$

For any $z \in Z(x,a)$ we have $g(x,p) \le g(z,p)$, either by the
monotonicity of $h$ in the case that $g(xa,p)<g(za,p)$ or by
definition in the other case. Thus if $Y(x,a)$ is empty then
$G(t(x),p)=g(x,p)$ and $xa$ is a 1-optimal sequence reaching state
$q$. On the other hand if $Y(x,a)$ is nonempty, we have
$g(y',p) = \min_{y \in Y(x,a)} g(y,p)$ for some $y' \in Y(x,a)$. Then
$g(y',p) \le g(y,p)$ for all $y \in Y(x,a)$, and
$g(y',p)<g(x,p) \le g(z,p)$ for all $z \in Z(x,a)$, thus
$G(t(x),p)=g(y',p)$. But $g(y'a,p)=g(xa,p)=G(q,p)$, so $y'a$ is a
1-optimal sequence reaching state $q$.

(only if): Suppose now that (6) holds. For an arbitrary
state $q$, let $G(q,p)=g(xa,p)$ and $G(q',p)=g(x,p)$ where $t(q',a)=q$
and $t(x)=q'$; i.e., $xa$ is a 1-optimal sequence reaching state $q$.
Suppose that $h$ is not b-monotonic, so for some sequence $ya$ we
have $g(xa,p)<g(ya,p)$ and $g(x,p) \ge g(y,p)$. By the 1-optimality of
$xa$ we have $g(x,p)=G(q',p) \le g(y,p)$. Furthermore we must have
$g(x,p)<g(y,p)$ since
$g(x,p)=g(y,p) \rightarrow h(g(x,p),t(x),a,p) = h(g(y,p),t(x),a,p)$;
i.e., $g(xa,p)=g(ya,p)$. This contradiction
shows that $h$ is b-monotonic. QED

Theorem 6. A ddp $D=(A,S,P,f)$ is weakly-represented by an sdp
$\Pi=(A,Q,q_0,Q_f,t,h,k,P)$ iff $\Pi$ represents $D$ and (6) holds.

proof: (if): Suppose that the ddp $D=(A,S,P,f)$ is weakly-represented
by an sdp $\Pi=(A,Q,q_0,Q_f,t,h,k,P)$. By definition $\Pi$
represents $D$. Now let $q$ be an arbitrary state. By (3),

$$G(q,p) = \min_{\{(q',a) \mid t(q',a)=q\}} h(G(q',p),q',a,p).$$

Let $G(q,p) = h(G(\bar{q},p),\bar{q},a,p)$ and let $G(\bar{q},p)=g(y,p)$; then
$G(q,p) = h(G(\bar{q},p),\bar{q},a,p) = h(g(y,p),\bar{q},a,p) = g(ya,p)$. We
have just shown that $ya$ is a 1-optimal sequence reaching state
$q$. Thus (6) holds.

(only if): Suppose now that Π represents D and (6) holds.
For any state q∈Q, there exists a sequence xa such that t(xa)=q,
G(q,p)=g(xa,p), and G(q̂,p)=g(x,p). G(q,p) = g(xa,p) =
h(g(x,p),q̂,a,p) = h(G(q̂,p),q̂,a,p), which implies that we can find
the value G(q,p) by minimizing the expression h(G(q',p),q',a,p)
over all q'∈Q, a∈A such that t(q',a)=q, and thus we get (3). (2)
follows by definition. By definition all elements of Θ(q,p) are
optimal sequences which reach state q. To see that Θ(q,p) is
nonempty, note that since (6) holds there is a sequence xa such
that G(q,p)=g(xa,p) and G(q',p)=g(x,p) where T(q',p)=q, and by
definition such an xa is in Θ(q,p). Finally Π represents D by
assumption. QED

Corollary 2. A ddp D=(A,S,P,f) is weakly-representable by a sdp
Π=(A,Q,q0,Qf,T,h,k,P) iff Π represents D and Π is a b-msdp.

proof:  immediate  from  theorems  5  and  6. 

We have now characterized the classes of sdp's which weakly
and strongly represent ddp's. The difference between these two
types of representations is illustrated in figure 2. Here h is
b-monotonic but h is not s-monotonic. According to equation
(3), in order to determine an optimal sequence reaching q, we
consider an extension of an optimal sequence reaching q'. But in
restricting the search to optimal sequences reaching q', equation
(3) overlooks the optimal sequence ya reaching q. This illustrates
why b-msdp's can only weakly-represent a ddp.
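
The effect described above can be reproduced on a small instance. The following is only a sketch under stated assumptions: the three states, the alphabet {x, y, a}, the initial cost 0, and the particular function h are hypothetical, chosen so that the costs agree with those shown in figure 2, and the parameter p is omitted. The first routine enumerates every sequence and keeps all minimum-cost ones; the second follows the recursion of equation (3) and extends only optimal sequences reaching the predecessor state.

```python
from itertools import product

# Hypothetical sdp reproducing the costs shown in figure 2 (parameter p omitted).
START, Q_PRIME, Q = "q0", "q'", "q"
TRANSITIONS = {(START, "x"): Q_PRIME, (START, "y"): Q_PRIME, (Q_PRIME, "a"): Q}

def h(cost, state, action):
    """Cost after extending a sequence of cost `cost`, ending in `state`, by `action`."""
    if state == START:
        return cost + {"x": 10, "y": 12}[action]
    return max(cost, 16)  # both extensions by `a` tie at 16, as in figure 2

def run(seq):
    """Return (state reached, cost g) for `seq`, or None if a transition is undefined."""
    state, cost = START, 0  # initial cost 0 is an assumption
    for action in seq:
        if (state, action) not in TRANSITIONS:
            return None
        cost = h(cost, state, action)
        state = TRANSITIONS[(state, action)]
    return state, cost

def optimal_by_enumeration(target, max_len=2):
    """All minimum-cost sequences reaching `target`, found by brute force."""
    reached = {}
    for n in range(max_len + 1):
        for letters in product("xya", repeat=n):
            seq = "".join(letters)
            result = run(seq)
            if result is not None and result[0] == target:
                reached[seq] = result[1]
    best = min(reached.values())
    return {seq for seq, cost in reached.items() if cost == best}

def optimal_by_recursion():
    """Solve the recursion state by state, extending only optimal predecessors."""
    G = {START: 0}
    theta = {START: {""}}
    for state in (Q_PRIME, Q):  # assumed topological order of the states
        candidates = [(h(G[prev], prev, a), prev, a)
                      for (prev, a), nxt in TRANSITIONS.items() if nxt == state]
        G[state] = min(cost for cost, _, _ in candidates)
        theta[state] = {x + a for cost, prev, a in candidates
                        if cost == G[state] for x in theta[prev]}
    return G, theta

G, theta = optimal_by_recursion()
print(G[Q], theta[Q], optimal_by_enumeration(Q))
```

On this instance both routines agree on the optimal cost 16 for state q, but the recursion returns only xa while enumeration returns both xa and ya.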

[Figure 2: g(x,p)=10, g(y,p)=12, g(xa,p)=16, g(ya,p)=16.]

The conditions established for the weak-representation of a
ddp are necessary in order to take care of fairly pathological
cost functions. It can be shown however that the cost function
of any sdp which weakly represents a ddp is equivalent to other
cost functions with nicer properties. Given a cost function h
which is b-monotonic, define the function g' (and thereby h') as
follows:

    g'(x,p) = g(x,p)          if x is 1-optimal,
              G(t(x),p)+1     otherwise.                     (7)

Define G'(q,p) = min { g'(x,p) | t(x)=q }. Lemma 4 given in the
appendix establishes the effective equivalence of h and h' in the
sense that the set of optimal sequences obtained for each state is
the same for both cost functions.
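
As a worked illustration of (7), the sketch below evaluates g' on the four sequences of figure 2. It rests on assumptions that should be checked against the definitions given earlier in the paper: a sequence xa is taken to be 1-optimal when it is optimal for the state it reaches and its prefix x is optimal for t(x) (the reading suggested by the proofs in this section), the empty sequence is treated as 1-optimal with cost 0, and the tables STATE_OF and COST are hypothetical encodings of the figure with the parameter p omitted.

```python
# Figure 2 values; STATE_OF and COST are illustrative tables, not from the paper.
STATE_OF = {"": "q0", "x": "q'", "y": "q'", "xa": "q", "ya": "q"}
COST = {"": 0, "x": 10, "y": 12, "xa": 16, "ya": 16}

def best_cost(state):
    """G(q): minimum cost over the listed sequences that reach `state`."""
    return min(COST[s] for s in COST if STATE_OF[s] == state)

def is_optimal(seq):
    """True when `seq` attains the minimum cost for the state it reaches."""
    return COST[seq] == best_cost(STATE_OF[seq])

def is_one_optimal(seq):
    """Assumed reading: `seq` is optimal and so is `seq` minus its last action."""
    if seq == "":
        return True
    return is_optimal(seq) and is_optimal(seq[:-1])

def g_prime(seq):
    """Equation (7): keep the cost of 1-optimal sequences, else G(t(x)) + 1."""
    return COST[seq] if is_one_optimal(seq) else best_cost(STATE_OF[seq]) + 1

for seq in ("x", "y", "xa", "ya"):
    print(seq, COST[seq], g_prime(seq))
```

Under these assumptions g' leaves x and xa at 10 and 16 and raises y and ya to 11 and 17, so xa is the only sequence of minimum g' cost reaching q.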

h is monotonic if ∀x,y∈A* ∀a∈A such that t(x)=t(y),
g(x,p)≤g(y,p) → g(xa,p)≤g(ya,p). An sdp with cost function h
which is monotonic is a monotonic sequential decision process
(m-sdp).
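
On a finite table of sequences the monotonicity condition can be checked directly. The sketch below assumes the non-strict reading of the implication (g(x,p)≤g(y,p) implies g(xa,p)≤g(ya,p)), encodes sequences as strings of one-character actions, and omits the parameter p. The function name and the table layout are hypothetical, and extensions missing from the table are simply skipped.

```python
from itertools import product

def is_monotonic(state_of, cost, actions):
    """Check g(x) <= g(y) => g(xa) <= g(ya) for all listed x, y with t(x) = t(y)."""
    for x, y in product(cost, repeat=2):
        if state_of[x] != state_of[y] or cost[x] > cost[y]:
            continue  # the implication only constrains pairs with g(x) <= g(y)
        for a in actions:
            xa, ya = x + a, y + a
            if xa in cost and ya in cost and cost[xa] > cost[ya]:
                return False
    return True

# Illustrative tables: the first uses the figure 2 costs, the second breaks the rule.
STATE_OF = {"x": "q'", "y": "q'", "xa": "q", "ya": "q"}
FIGURE_2 = {"x": 10, "y": 12, "xa": 16, "ya": 16}
VIOLATION = {"x": 10, "y": 12, "xa": 17, "ya": 16}

print(is_monotonic(STATE_OF, FIGURE_2, "a"))
print(is_monotonic(STATE_OF, VIOLATION, "a"))
```

The second table fails because x is no more costly than y yet its extension xa is more costly than ya.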

Lemma 5. If for some sdp Π=(A,Q,q0,Qf,T,h,k,P) h is b-monotonic
then h' defined by (7) is monotonic.

proof: Consider the function h' defined in (7). h' can be shown
to be monotonic as follows. Let t(x)=t(y)=q', t(q',a)=q and
g'(x,p)≤g'(y,p). If xa is 1-optimal then g'(xa,p)=g(xa,p)=G(q,p)
and since g'(ya,p) has the value G(q,p) or G(q,p)+1,
g'(xa,p)≤g'(ya,p). Suppose now that ya is 1-optimal, then
G(q,p)=g(ya,p) and G(q',p)=g(y,p), g'(ya,p)=g(ya,p) and
g'(y,p)=g(y,p)=G(q',p)=g'(x,p) (since g'(x,p)≤g'(y,p)). But if
g'(x,p)=g'(y,p) then g'(xa,p) = h'(g'(x,p),q',a,p) =
h'(g'(y,p),q',a,p) = g'(ya,p) (thus g'(xa,p)≤g'(ya,p)). If
neither xa nor ya is 1-optimal then g'(xa,p)=g'(ya,p)=G(q,p)+1.
In all cases the monotonicity of h' is shown. QED

The  following  result  is  well  known  (Elmaghraby  1970,  Bonzon 
1970)  in  the  sense  that  dynamic  programs  are  in  one  to  one 
correspondence  with  monotonic  sdp's.  However  to  the  author's 
knowledge  it  has  not  been  pointed  out  that  m-sdp's  can  only 
weakly  represent  a  ddp;  i.e.,  one  is  not  guaranteed  to  be  able  to 
obtain  all  optimal  solutions  from  a  representation  by  a  m-sdp. 

Theorem 7. A ddp D=(A,S,P,f) is weakly-represented by some sdp
Π=(A,Q,q0,Qf,T,h,k,P) iff there is a m-sdp
Π'=(A,Q,q0,Qf,T,h',k,P) which weakly-represents D.

proof: (if): We must show that a m-sdp can represent D. Let xa
be an optimal sequence reaching q, so G(q,p)=g(xa,p). Suppose
g(xa,p)<g(ya,p) yet g(x,p)≥g(y,p). By the monotonicity of h', we
get g(xa,p)≥g(ya,p) which contradicts our assumption. Thus
g(x,p)<g(y,p) and h' is b-monotonic. By corollary 2, Π' weakly-
represents D.

(only if): Suppose that D is weakly-represented by an sdp
Π=(A,Q,q0,Qf,T,h,k,P), and h' is defined by (7) from h, then by
corollary 2, h is b-monotonic and by lemma 5 h' is monotonic.

We can show that D is weakly-represented by the sdp
Π'=(A,Q,q0,Qf,T,h',k,P). (2) holds by definition. Let x∈A* be a
1-optimal sequence reaching state q∈Q so G(q,p)=g(x,p). Such a
sequence exists by theorem 6. By construction g'(x,p)=g(x,p) so
G'(q,p)=G(q,p) for all states q∈Q. Equation (3) must hold for
G'(q,p) since it holds for G(q,p) as a result of corollary 2.
Lemma 4 shows that Θ(q,p)=Θ'(q,p) so Θ'(q,p) is a nonempty subset
of optimal sequences. Finally Π' represents D since F(Π')=F(Π)=S
and

    Θ(Π',p) = ∪{Θ'(q,p) | q∈Qf} = ∪{Θ(q,p) | q∈Qf} = Θ(Π,p) ⊆ Θ(D,p).

QED

6 .   Conclusion. 

This  paper  has  given  necessary  and  sufficient  conditions  for 
the  strong  and  weak  representation  of  a  discrete  decision  problem 
by  a  sequential  decision  process.  Strictly  monotonic  (monotonic) 
sequential decision processes have been shown to be equivalent in
the  strong  (weak)  representation  sense  to  the  class  of  discrete 
decision  problems  which  can  be  formulated  as  discrete  dynamic 
programs.  We  have  shown  that  the  problems  to  which  the  principle 
of  optimality  applies  are  a  subclass  of  the  problems  to  which  the 
functional  equations  of  dynamic  programming  are  applicable. 

Appendix 

In order to establish lemma 1 we will need the following
definition and lemma. We say x∈A* is completely-optimal if every
initial segment y of x (every y∈A* such that there exists z∈A*
such that yz=x) is 1-optimal.

Lemma 2. xa∈Θ(q,p) iff xa is completely optimal.

proof: by induction on the length of a sequence. Let the length
of x be 1, i.e., x∈A. (q0,x)∈Θ(q,p) iff x∈T(q,p) and e∈Θ(q,p)
where e is the empty sequence and t(x)=q. By definition
e∈Θ(q0,p) and x∈Θ(q,p) iff G(q,p)=g(x,p) iff x is an optimal
sequence.

Induction step: Assume that the lemma holds for any
sequence of length <m and let the length of the sequence xa be m.
xa∈Θ(q,p) iff (q',p)∈T(q,p) and x∈Θ(q',p) where T(q',p)=q. By
induction hypothesis x∈Θ(q',p) iff x is completely optimal. This
implies that G(q',p)=g(x,p). Also (q',p)∈T(q,p) iff
G(q,p)=h(G(q',p),q',a,p) = h(g(x,p),q',a,p)=g(xa,p). (xa is
1-optimal and x is completely optimal → xa is completely optimal),
i.e., xa is completely optimal. QED

The following lemma establishes the effective equivalence of h
and h' defined by (5) in the sense that the set of optimal
sequences obtained for each state is the same for both cost
functions. The lemma also holds true for h' defined by equation (7).

Lemma 4. ∀q∈Q, ∀p∈P Θ(q,p)=Θ'(q,p).

proof: x∈Θ(q,p) iff x is completely optimal (by lemma 2),
iff x=a1a2...an and a1...ai is 1-optimal with respect to h for
    i=1,...,n
iff g'(a1,p)=g(a1,p)=G'(t(a1),p) and ... and g'(a1...an,p) =
    g(a1...an,p) = G'(t(a1...an),p) by construction,
iff x is completely optimal with respect to h',
iff x∈Θ'(q,p) (by lemma 2).
QED